3D Coding TAT - Render Attribute Trees (Rats)
This is an idea I've had for some time. It was developed after a number of attempts to think of a way to reduce the number of polygons (and the CPU workload) in a more global way rather than on the usual one-by-one basis. Even when you use a quick hidden-surface check there is still an incredible amount of CPU work to do. If a level is made up from many 1000's of polygons/surfaces then what we want is to quickly reduce this large number down to 100's or less.
1: Groups and Trees
The idea of a "group" (or more correctly a "hierarchical") approach may hold the secret to new fast renderers. This is where I feel most time could be devoted when writing a 3d engine. EVERY graphic book I have seen (2d and 3d) focus their eye on the individual components rather than the whole picture. The few good books usually give you a good description of the 'atomic-level' of a 3d engine, likes lines, polygons and textures if you are VERY lucky. And in the closing chapters may say something like "armed with the previous building-blocks it should be relativity easy to produce a full blown game etc..." With the exception of perhaps BSP (Binary Space Partion) tree structures there is very little said the larger issues. There is a large amount of work just getting a filled polygon on the screen, rotating it, etc.. so sometimes the low level of the polygon is where they stop. It's a bit like describing a tree in terms of individual leaves. Now imagine an entire forest and count all those leaves!!!
Levels and objects are not made from isolated, self-contained polys; their polygons share a large number of relative or exclusive properties. (Two polygons share an identical edge, the front excludes the back, the left excludes the right and so on...) The only time the subject of front and back usually comes up is when describing hidden surface removal or possibly shading. I would say 80% or more of 3d objects are rigid (walls do not buckle, floors do not fold inside out and most computer people do not shear) so objects and their components can be thought of as being made up from uncrushable polyhedra (a cube for a head, tall boxes for legs or narrow coffins for doors etc.). Given this information it should be possible to devise some hierarchical/tree based system to describe objects AND entire levels. To put it simply, we want to discard entire rooms and their contents and only render the local stuff which relates to our view point & direction in the 3d world.
NOTE: I'm not talking about BSP (Binary Space Partition) tree structures, but some form of hashing/rendering attribute search.
A good example of this is a goldfish inside a bowl inside a room inside a certain level inside a building, etc. You wouldn't want to draw the goldfish and all its polygons every time would you? The most obvious method is to use a 'contents' list for each parent structure. It could look like this:
                            3d world
                           /        \
                  BUILDING2          BUILDING1
                                    /         \
                             level 2           level 1
                                              /       \
                                        room 2         room 1
                                       /                     \
                                   table                 goldfish bowl
                                                               \
                                                             goldfish
I know the above diagram might look similar to BSP trees but I'm talking about something completely different. A BSP tree would place individual polygons at each tree node (branch) or at each tree leaf. This "RAT" (Render Attribute Tree) method sub-divides the entire 3d environment down into smaller and smaller regions where the topmost 'root' node is the entire 3d world (or universe) and each level down the tree is a sub-region contained within its parent region. The "RAT" method in this case has been used to describe a Russian Dolls approach (smaller things inside larger ones). So the attribute by which the tree is navigated would be a 'contents' or 'inside' one, but other attributes could be used instead, like direction from the view-point. By beginning somewhere in the tree we can easily discard very large sections of the 3d environment. Also we can limit how far down the tree we traverse when rendering a scene (we only bother drawing the rooms if the building itself is visible).
The basic idea is to specify a search attribute or guide which employs a low-cost CPU calculation or rule that fragments our 3d environment into small, CPU friendly chunks. Another attribute could be the INCLUSIVE or EXCLUSIVE properties of convex polyhedra where a front (facing us) polygon excludes the back face. In the case of 3d levels entire rooms could be included or excluded depending on whether a door or window was visible (and open of course).
This tree diagram should remind you of the directory structure used in DOS or Windows etc. The great advantage of it is the ability to discard large portions of items quickly and correctly. For example if we were in "room2" then we only need to draw the table as no other items share the same parent ("room2"). Likewise if we were on level2 then room2, room1, the table, the goldfish bowl and the goldfish would all be ignored.
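To make this a little more concrete, here is a minimal sketch of how such a 'contents' tree might be walked. It assumes each region node keeps a child list and a sibling link, and that region_visible() is whatever cheap bounding test the engine already has (a bounding sphere against the view volume, say); none of these names come from a real engine.

    #include <stddef.h>

    /* One node of the "contents" tree: a region plus everything inside it. */
    typedef struct Region {
        struct Region *child;       /* first item contained in this region   */
        struct Region *sibling;     /* next item sharing the same parent     */
        /* bounding volume, polygon list, etc. would also live here          */
    } Region;

    int  region_visible(const Region *r);   /* assumed cheap visibility test */
    void draw_region_polys(const Region *r);

    void render_region(const Region *r)
    {
        for (; r != NULL; r = r->sibling) {
            if (!region_visible(r))     /* reject the parent...                */
                continue;               /* ...and ALL of its contents for free */
            draw_region_polys(r);       /* draw this region's own polygons     */
            render_region(r->child);    /* only then bother with the contents  */
        }
    }

If the building fails the test we never even look at its levels, rooms or the goldfish.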
A separate "render attribute tree" would be pre-built for each rigid object or level (by the object/level editor). Now the engine can use the viewpoint and object orientation to create a number of "sieve" parameter(s) which are used to guide it through the render tree more quickly than checking each and every polygon. At the start the entire object must be considered as possibly being visible, then after a polygon or so has been classified as hidden or visible the tree kicks in. If this polygon is visible then it must be now be at the front (facing the view plane). We can now reject ALL the polygons on the opposite side to it OR we could accept those polygons which share its direction. In essence we use the pre-defined relationships between the various components of a object/level to group them together. In the simplest example, a cube, the front would exclude the back, the top excludes the bottom, the left the right, the bottom the top and so on. It's as simple as that! One implementation could be to create a polygon list for every possible viewing angle, but this would be far too expensive in terms of memory. In effect this list would be a lazy form of hidden surface removal. As most engines use far more then 360 degrees for each of the viewing angles (X, Y and Z) this adds up to a whole heap of RAM, so is useless.
Imagine the below shape folded into a cube with its 6 faces (a to f). Cut out this shape to form a cube or try using a lightbulb box and label the faces a to f.
    e.g.                         +---+
                                 | a |
                             +---+---+---+
                             | e | b | f |
    flattened cube --->      +---+---+---+
                                 | c |
                                 +---+
                                 | d |
                                 +---+
We could create the following EXCLUSION lists.
        Face      EXCLUDE
        ------    ---------
          a          c
          b          d
          c          a
          d          b
          e          f
          f          e
If we take face b as being the front and say that it's visible then it would exclude face d (because you can't see the front and the back at the same time). We check face a; if that's visible then exclude face c. This continues until all the faces have been checked or excluded. Of course if the first face b had been hidden then the excluded items would need to be inverted to become inclusions instead, so making face d visible. The exclusion of faces could easily be done using a bytemap or bitmap flag system to denote either HIDDEN or VISIBLE polygons/surfaces. This flag map would first need to be cleared before each render cycle. This extra CPU time would be reclaimed by the savings in the hidden surface checks, which often use a few multiplications, whereas an INCLUDED/EXCLUDED face only requires a true/false flag test. You might have to employ a 3rd state (UNSURE) to describe each polygon face. Whenever this state is found the polygon is checked with the normal hidden-surface test (back-face culling) and then it can include/exclude other polygons. The very first polygon should be marked as UNSURE to force a visibility check.
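Here is a rough sketch of that three-state flag idea for the six cube faces (a to f). The opposite[] table is simply the EXCLUDE list above, and back_face_test() stands in for whatever hidden-surface test the engine already uses; all the names are made up for illustration.

    enum { UNSURE, HIDDEN, VISIBLE };

    /* face order: a, b, c, d, e, f; opposite[] is the EXCLUDE table above */
    static const int opposite[6] = { 2, 3, 0, 1, 5, 4 };   /* a<->c, b<->d, e<->f */

    int back_face_test(int face);   /* assumed: returns 1 if the face is visible */

    void classify_faces(int state[6])
    {
        int i;

        for (i = 0; i < 6; i++)           /* clear the flag map each render cycle */
            state[i] = UNSURE;

        for (i = 0; i < 6; i++) {
            if (state[i] != UNSURE)       /* already included/excluded: a flag    */
                continue;                 /* test only, no multiplications needed */
            state[i] = back_face_test(i) ? VISIBLE : HIDDEN;
            state[opposite[i]] = (state[i] == VISIBLE) ? HIDDEN : VISIBLE;
        }
    }

Each real back-face test marks two faces, so at most three of the six faces ever reach the expensive check.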
It should be easy to see that the more complex the object the greater the saving 'should' be. In the cube example only a single face/polygon made up each exclude/include list, but for far more detailed objects (an open box for example) more than one face/polygon can be rejected or accepted this way.
Efficient data/level definitions can reduce all the repetitive calculation and data processing that most other "one-by-one" programs/3d engines use. Optimizing a level rendering algorithm should produce a greater performance gain than optimizing the heavily used polygon scan-converting code, simply because the fewer surfaces passing through the polygon drawer the better. Another area worth spending time on is the clipping/surface rejection. Again the smaller the number of polygons that you need to process and draw, the faster your 3d engine will run.
2: Freedom vs. Scoped Lists
One of the advantages of using a 3d based environment is that the player can explore and view from an almost infinite number of angles and places. But this also means that the graphics engine must cope with a great deal of flexibility. The kind of game in which a 3d engine is going to be used should dictate the level data structures and rendering technique used. For example a racing or track based game such as Screamer, Motorhead or any other of the latest circuit driven ones is much easier to program in terms of its level design than a more general and flexible exploration engine like that used in Quake or Unreal. With a track based game engine it is possible to design a level in a linear way using nothing more complex than a circular list. Even the few tracks which allow the player to choose between two possible paths can be modelled in this way. All that is needed is another list which can be followed instead of the normal circuit's path; at the end of this list it points back to rejoin the circuit again.
Because a track has only one start and one finish point we can apply quite a lot of pre-processing to it. We can store the track as a tokenized list of buildings, sign-posts, road types and other hazards. This means that as the player's vehicle moves along the track we can use a tail and head approach to guide the engine to render only a very small section of the circuit instead of processing the entire thing in one go. The 'tail' could be our view-point and the 'head' could be the distant horizon. So any building, object or polygon behind our view or further than the horizon is ignored. If a track is defined in a building-block way using a list of world coordinates and object tokens (a bit like the planet method) then as each building/structure comes into the visible body-section of the track (between head and tail) we can transform the vertices and draw the polygons. This way we are not transforming unused vertices or processing polygons on the other side of the track. I will call this method of rendering the "scope" method as only a very small region of a level (circuit) is processed at any one time. In essence the renderer is fitted with blinkers; this prevents it from seeing too much of the map and so speeds it up.
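A minimal sketch of the scope idea, assuming the circuit is stored as a circular list of track segments. TrackSeg, draw_segment() and SCOPE_LEN are made-up names; a real engine would hang the buildings, sign-posts and road tokens off each segment.

    #define SCOPE_LEN 20                /* segments between the tail and the horizon */

    typedef struct TrackSeg {
        struct TrackSeg *next;          /* circular: the last segment points back
                                           to the first                            */
        /* world coordinates, object tokens, road type, hazards ...                */
    } TrackSeg;

    void draw_segment(const TrackSeg *s);   /* transform + draw one segment */

    void render_scope(const TrackSeg *tail)
    {
        const TrackSeg *s = tail;
        int i;
        for (i = 0; i < SCOPE_LEN; i++) {   /* everything outside this window is */
            draw_segment(s);                /* never even touched                 */
            s = s->next;
        }
    }

Moving the car forward is just a matter of advancing the tail pointer one segment along the list.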
Games such as Quake, Quake 2 and Unreal are unable to employ this "scope" method because their environments are interconnected on a far, far greater scale than any racing game I have seen (so far). They do not conform to a simple looping circuit which has a single path to follow along the twisting, linear track but have a web of connected rooms, sub-rooms and open areas. This means the player can choose to follow any of these possible paths and not just drive forwards or backwards in their vehicle. This freedom is good for the player but bad for the programmer. As I have said elsewhere in this large document CPUs really like sequential data and dislike parallel or branching data structures, because some form of selection process is needed to direct an algorithm along a certain pathway and this takes time. The advantages of using a "scoped list" cannot easily (if at all) be applied to these freedom based environments and when attempted can cause coding headaches, if not large storage problems.
As there are a vast number of possible paths through these highly interconnected environments there is a vast number of scoped lists to represent each one. Their paths will cross in many places where one list runs north-south and another runs east-west; of course there can be many more directions than the 4 compass points. Also as current gaming environments have taken the step from a 2d based map (i.e. Doom, Wolfenstein) to a true 3d one, they need to be able to handle vertically joined areas (rooms upon rooms for example). So instead of a 1d linear list we have something like a 3-dimensional web. To call this tricky is a slight understatement.
3: Environment Webs & Plants
It might first seem that a true 3d environment must be defined and processed in its entirety because of the immense number of possible viewing angles and view points, but it is possible to use a number of "plants" instead. Instead of constructing an entire world one polygon at a time, we can place buildings, structures and various other natural items on the landscape. So given a list of world positions and a building we could easily draw a whole city or town. Place a few trees and a large, blue, wobbly lake and you have Milton Keynes (heh heh). This "Plant" idea is NOT the same as the "Planet" one, because the planets can move, but plants can not (they are static in the game world) unless they are Triffids (Eck!).
By using just the building's/structure's world coordinates (which is normally the object's origin) we only have to deal with a single set of (X,Y,Z) coordinates. Then afterwards when we come to draw one of these environmental object models we can rotate, transform and project all its vertices around this world position. And yet again the hierarchical data structure appears in this object placement method of describing a scene. These world (X,Y,Z) coordinates can be thought of as the parent origin vertices with the object model's vertices being their children. Now if we can reject the world (X,Y,Z) position (too far away or behind the view-point) then we can reject the whole object model too. We have "pruned the plants".
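A small sketch of that pruning step: test only the parent origin of each placed model and throw the whole model away if that single point fails. Vec3, Plant, point_in_view() and MAX_DRAW_DIST are all invented for this example.

    #define MAX_DRAW_DIST 4096.0f       /* assumed far limit of the view */

    typedef struct { float x, y, z; } Vec3;

    typedef struct {
        Vec3        origin;             /* the plant's world position (parent vertex) */
        const void *model;              /* shared model data, one copy per plant type  */
    } Plant;

    int  point_in_view(Vec3 p, float max_dist);   /* assumed cheap distance/direction test */
    void draw_model(const void *model, Vec3 at);  /* rotate, transform, project, draw      */

    void render_plants(const Plant *plants, int count)
    {
        int i;
        for (i = 0; i < count; i++) {
            if (!point_in_view(plants[i].origin, MAX_DRAW_DIST))
                continue;               /* one rejected point = the whole model rejected */
            draw_model(plants[i].model, plants[i].origin);
        }
    }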
4: OCT-Trees
It would be nice to process only some of these world (X,Y,Z) model origins to speed things up even further. Rather than rotate and check each one we would like to be able to scan the local, surrounding areas and choose those within range. We could use a 2d or 3d array map to divide the world into smaller regions (see Grids and Coarse Area Nets). But I will attempt to describe another method, that of "3d Linked-Lists" or "Environment Webs". The idea is simple: for every world (X,Y,Z) position create a multi-directional node and use it to point to its surrounding neighbours. Within the data structure of a node there are a number of pointers or 'links' which are used to direct the renderer to the closest neighbouring object in a particular direction. A normal linked-list is a 1-dimensional version of this; it directs a program left-to-right or north-to-south (etc.) through a sequential list. A 2-dimensional linked-list (or should that be linked-array?) can guide a program in four directions (north, south, east, west); this is suitable for a 2d map (or even a 3d one), but a 3-dimensional "web" (linked-list) would perhaps be better as we could follow up and down links as well.
I believe "Oct-Trees" are 8 sided trees (I guess) and are used to describe the relative placement of neighbouring items with respect to the current one. I haven't seen ANY info about these structures so please read this with a pinch of salt. It is probable that they only describe the placement of items in a 2-dimensional way.
    e.g.         NW     N     NE
                   \    |    /
                    \   |   /
                     \  |  /
           W ----------+---------- E
                     /  |  \
                    /   |   \
                   /    |    \
                 SW     S     SE
5: Spinning the Web
Now there is the problem of creating this 3-dimensional "web". Given just a list of world (X,Y,Z) coordinates how can we build the links between items? If these items can be placed at any position along these three axes it means that any one item can be at any angle or any distance to any other. So how many links do we need? Do we need 360 for the X-Z plane and another 360 for the Y-Z plane for example?
Well it would be very impractical to use 720 links for every plant-node as most of these will be unused, so I suggest using just 6 links (like the faces of the worn-out cube example).
Starting at a particular item in the list we can allocate a node structure and then search for the closest item to it. Its direction is also found and used to select one of the node's possible link fields.
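A rough sketch of how the web might be spun from a flat list of plant positions: each node gets six link slots (one per axis direction), we find its closest neighbour, and the neighbour's direction picks which slot it is filed under. WebNode, spin_web() and the axis convention (Y pointing up) are assumptions made purely for this example.

    #include <math.h>
    #include <stddef.h>

    enum { EAST, WEST, NORTH, SOUTH, UP, DOWN };     /* the six link slots */

    typedef struct WebNode {
        float x, y, z;                  /* world position of this plant      */
        struct WebNode *link[6];        /* nearest neighbour per direction   */
    } WebNode;

    /* Pick the slot whose axis dominates the direction from 'a' to 'b'. */
    static int dominant_link(const WebNode *a, const WebNode *b)
    {
        float dx = b->x - a->x, dy = b->y - a->y, dz = b->z - a->z;
        float ax = fabsf(dx), ay = fabsf(dy), az = fabsf(dz);
        if (ax >= ay && ax >= az) return (dx >= 0) ? EAST  : WEST;
        if (az >= ay)             return (dz >= 0) ? NORTH : SOUTH;
        return (dy >= 0) ? UP : DOWN;
    }

    void spin_web(WebNode *nodes, int count)
    {
        int i, j;
        for (i = 0; i < count; i++) {
            float best = 1e30f;
            int   nearest = -1;
            for (j = 0; j < 6; j++)             /* start with no links        */
                nodes[i].link[j] = NULL;
            for (j = 0; j < count; j++) {       /* brute-force closest item;  */
                float dx, dy, dz, d2;           /* fine as a pre-process step */
                if (j == i) continue;
                dx = nodes[j].x - nodes[i].x;
                dy = nodes[j].y - nodes[i].y;
                dz = nodes[j].z - nodes[i].z;
                d2 = dx*dx + dy*dy + dz*dz;
                if (d2 < best) { best = d2; nearest = j; }
            }
            if (nearest >= 0)
                nodes[i].link[dominant_link(&nodes[i], &nodes[nearest])] =
                    &nodes[nearest];
        }
    }

As this is done by the editor rather than the engine, the O(n squared) search for the closest item is not a great worry.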
6: Caught in the Web
But this 3d web suffers from the same problem as navigating a path through a tree structure, that of having to select between a number of node links. Because the direction which each link represents won't precisely match our rendering direction it is possible that the renderer will be forced to choose between two close paths. Suppose that we only had 4 links from each node to choose from and each link was at 90 degrees to the others.
    e.g.            90'
                     |      / desired direction
                     |     /
                     |    /
                     |   /
                     |  /
                     | /
        180' --------+-------- 0'
                     |
                     |
                     |
                    270'
Now we want to follow a 45 degree path, so which path link should be used? Remember that these north, south, east and west links point to the closest item, which is probably NOT exactly north, south, east or west of our current one. So it might be at 80 degrees or 101 degrees instead of 90 degrees.
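One possible (and again only sketched) way to decide is to take the dot product of the desired direction against each link's rough axis and follow the link that scores highest, so a 45 degree path simply picks whichever of the 0 and 90 degree links lies closest to it. The names and the direction table are my own assumptions.

    typedef struct { float x, y, z; } Dir3;

    /* rough unit direction of each link slot: E, W, N, S, up, down */
    static const Dir3 link_dir[6] = {
        {  1, 0,  0 }, { -1, 0,  0 },
        {  0, 0,  1 }, {  0, 0, -1 },
        {  0, 1,  0 }, {  0, -1, 0 }
    };

    int best_link(Dir3 want)            /* 'want' should be roughly unit length */
    {
        int i, best = 0;
        float best_dot = -2.0f;
        for (i = 0; i < 6; i++) {
            float d = want.x * link_dir[i].x
                    + want.y * link_dir[i].y
                    + want.z * link_dir[i].z;
            if (d > best_dot) {          /* largest dot product = smallest angle */
                best_dot = d;
                best = i;
            }
        }
        return best;
    }

Of course the item at the end of that link may still be well off-angle, which leads to the next problem.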
Another problem is that you may take what you believe is the best route link but may in fact have to back-track in the opposite direction to locate the item you want. This can cause you to revisit the same item twice or any number of times.
7: Multi-Attribute Sieves
The main characteristic of a 3d engine is its flexibility, and this is sadly also its performance hurdle. For a scrolling 2d game there are at most four possible directions to move in and only a very small, rigid window over the game world to draw. These are very linear; we know what part of the game world needs to be updated and how it should look. In most cases the most difficult part is moving or scrolling the screen bitmap, which can be done using the hardware panning registers and then redrawing the new edges of the map on the screen.
In the case of 3d the rotation and perspective mean that neighbouring pixels and surfaces do not have any local coherence which can be exploited; we can't simply pan the screen and draw a new edge. Polygons change in size, angle, shape and direction so everything must be recalculated in real time. Most (if not all) 3d map levels have an interconnected structure to them, which means there is no single linear way to navigate through them.
It would be nice to just specify the view angles and view position and be given the resulting polygon list from a pre-defined data structure of the level. But this is sadly not possible. As far as I know data structures such as lists, trees and arrays can only be used for one, single purpose. For example a polygon list built for one particular direction will not work for another direction (except perhaps a 180 degree turn where the list is scanned backwards).
The renderer is a form of multi-attribute sieve.
8: Recommended Reading
Most other 3d docs list a number of 3d programming books which they recommend, but I cannot, because I don't have a single 3d programming book! Most of my limited knowledge has been figured out the slow, painful way (late nights & aspirin).
You may think that your code is breaking new ground, pushing back the frontiers of programming, but in reality a thousand other programmers have done it years before you. If you ever need an Ego-check then visit your library and take out a dusty 1970's book on algorithms or computer graphics and soon you will be saying, "Wow, this geezer was a smart cookie before I was even born!!"
You won't find the ultimate 3d engine in a book, but in your head. The problem is finding a suitable download lead.
Have fun ...